Current Issue: January - March; Volume: 2014; Issue Number: 1; Articles: 5
Background: Comprehensive protein-protein interaction (PPI) maps are a powerful resource for uncovering the molecular basis of genetic interactions and providing mechanistic insights. Over the past decade, high-throughput experimental techniques have been developed to generate PPI maps at proteome scale, first using yeast two-hybrid approaches and more recently via affinity purification combined with mass spectrometry (AP-MS). Unfortunately, data from both protocols are prone to high false-positive and false-negative rates. To address these issues, many methods have been developed to post-process raw PPI data. However, with few exceptions, these methods analyze only binary experimental data (in which each potential interaction tested is deemed either observed or unobserved), neglecting quantitative information available from AP-MS such as spectral counts. Results: We propose a novel method for incorporating quantitative information from AP-MS data into existing PPI inference methods that analyze binary interaction data. Our approach introduces a probabilistic framework that models the statistical noise inherent in observations of co-purifications. Using a sampling-based approach, we model the uncertainty of interactions with low spectral counts by generating an ensemble of possible alternative experimental outcomes. We then apply the existing method of choice to each alternative outcome and aggregate results over the ensemble. We validate our approach on three recent AP-MS data sets and demonstrate performance comparable to or better than state-of-the-art methods. Additionally, we provide an in-depth discussion comparing the theoretical bases of existing approaches and identify common aspects that may be key to their performance. Conclusions: Our sampling framework extends the existing body of work on PPI analysis using binary interaction data to apply to the richer quantitative data now commonly available through AP-MS assays. This framework is quite general, and many enhancements are likely possible. Fruitful future directions may include investigating more sophisticated schemes for converting spectral counts to probabilities and applying the framework to direct protein complex prediction methods....
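The sampling idea described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the saturating rule in `count_to_probability`, its `half_saturation` parameter, the protein names, and the trivial `cooccurrence_score` stand-in for "the existing method of choice" are all assumptions made for the example.

```python
import random

def count_to_probability(spectral_count, half_saturation=2.0):
    """Map a spectral count to a probability in [0, 1).

    Hypothetical saturating rule chosen for illustration only; the abstract
    does not state which conversion the authors use.
    """
    return spectral_count / (spectral_count + half_saturation)

def sample_binary_outcome(counts, rng):
    """Draw one alternative binary experiment from the quantitative data."""
    return {pair for pair, count in counts.items()
            if rng.random() < count_to_probability(count)}

def cooccurrence_score(observed_pairs):
    """Stand-in for an existing binary-data method: 1.0 per observed pair."""
    return {pair: 1.0 for pair in observed_pairs}

def ensemble_scores(counts, binary_method, n_samples=1000, seed=0):
    """Apply the binary method to each sampled outcome and average the scores."""
    rng = random.Random(seed)
    totals = {pair: 0.0 for pair in counts}
    for _ in range(n_samples):
        outcome = sample_binary_outcome(counts, rng)
        for pair, score in binary_method(outcome).items():
            totals[pair] += score
    return {pair: total / n_samples for pair, total in totals.items()}

# Toy (bait, prey) spectral counts; the names are placeholders.
counts = {("baitA", "preyB"): 25, ("baitA", "preyC"): 1, ("baitD", "preyB"): 0}
scores = ensemble_scores(counts, cooccurrence_score)
```

Under these assumptions, a pair with a high spectral count appears in nearly every sampled outcome, while a pair with a count of 1 appears in only a fraction of them, so its aggregated score is discounted accordingly.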
Background: Existing tools to model cell growth curves do not offer a flexible, integrative approach to managing large datasets and automatically estimating parameters. With the growing volume of experimental time series from microbiology and oncology, there is a clear need for software that lets researchers easily organize experimental data and efficiently extract relevant parameters at the same time. Results: BGFit provides a unified web-based platform in which a rich set of dynamic models can be fitted to experimental time-series data and the results can be managed efficiently in a structured, hierarchical way. The data management system lets users organize projects, experiments, and measurement data, and define teams with different editing and viewing permissions. Several dynamic and algebraic models are already implemented, such as polynomial regression, Gompertz, Baranyi, Logistic, and Live Cell Fraction models, and users can easily add new models to extend the current set. Conclusions: BGFit allows users to easily manage their data and models in an integrated way, even if they are not familiar with databases or existing computational tools for parameter estimation. BGFit is designed with a flexible architecture that focuses on extensibility and leverages free software along with existing tools and methods, allowing users to compare and evaluate different data modeling techniques. The application is described in the context of fitting bacterial and tumor cell growth data, but it is also applicable to any type of two-dimensional data, e.g. physical chemistry and macroeconomic time series, and it scales to large numbers of projects and to high data and model complexity....
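As a rough illustration of what fitting a growth model to a time series involves, the sketch below fits a three-parameter Gompertz curve using only the Python standard library. It does not reflect how BGFit estimates parameters, which the abstract does not describe; the parameterization y(t) = a·exp(−b·exp(−c·t)), the profile search over the asymptote `a`, and the synthetic noise-free data are all assumptions made for the example.

```python
import math

def gompertz(t, a, b, c):
    """Gompertz growth curve y(t) = a * exp(-b * exp(-c * t))."""
    return a * math.exp(-b * math.exp(-c * t))

def sse(params, times, values):
    """Sum of squared errors between the model and the observations."""
    a, b, c = params
    return sum((gompertz(t, a, b, c) - y) ** 2 for t, y in zip(times, values))

def fit_gompertz(times, values, n_grid=200):
    """Profile over the asymptote a; for each candidate, use the linearization
    ln(-ln(y / a)) = ln(b) - c * t and ordinary least squares for b and c."""
    n = len(times)
    t_mean = sum(times) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    y_max = max(values)
    best = None
    for i in range(1, n_grid + 1):
        a = y_max * (1.0 + 0.005 * i)  # candidates just above the largest observation
        z = [math.log(-math.log(y / a)) for y in values]
        z_mean = sum(z) / n
        slope = sum((t - t_mean) * (zi - z_mean) for t, zi in zip(times, z)) / sxx
        intercept = z_mean - slope * t_mean
        params = (a, math.exp(intercept), -slope)
        err = sse(params, times, values)
        if best is None or err < best[1]:
            best = (params, err)
    return best

# Synthetic, noise-free measurements generated from known parameters.
times = list(range(11))
values = [gompertz(t, 2.0, 5.0, 0.6) for t in times]
(a_hat, b_hat, c_hat), error = fit_gompertz(times, values)
```

With real, noisy measurements a dedicated nonlinear least-squares routine would normally be preferable; the linearization here is only meant to keep the example dependency-free, and it requires strictly positive observations.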
Background: The use of Gene Ontology (GO) data in protein analyses has contributed substantially to the improved outcomes of these analyses. Several GO semantic similarity measures have been proposed in recent years, providing tools that integrate the biological knowledge embedded in the GO structure into different biological analyses. There is a need for a unified tool that gives the scientific community the opportunity to explore these different GO similarity measures and their biological applications. Results: We have developed DaGO-Fun, an online tool available at http://web.cbio.uct.ac.za/ITGOM, which incorporates many different GO similarity measures for exploring, analyzing, and comparing GO terms and proteins within the context of GO. It uses GO data and UniProt proteins with their GO annotations, as provided by the Gene Ontology Annotation (GOA) project, to precompute GO term information content (IC), enabling rapid responses to user queries. Conclusions: The DaGO-Fun online tool has the advantage of integrating all the relevant IC-based GO similarity measures, including topology- and annotation-based approaches, to facilitate effective exploration of these measures and thus enable users to choose the most relevant approach for their application. Furthermore, this tool includes several biological applications related to GO semantic similarity scores, including the retrieval of genes based on their GO annotations, the clustering of functionally related genes within a set, and term enrichment analysis...
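To make the notion of precomputed information content concrete, the sketch below computes annotation-based IC on a tiny hand-made hierarchy and then a Resnik-style term similarity (the IC of the most informative common ancestor). It is an illustration only, not DaGO-Fun's code: the term names, the protein identifiers, and the choice of Resnik's measure as the example are assumptions, and real GO terms and GOA annotations are not used.

```python
import math

# Hypothetical toy hierarchy: term -> set of parent terms.
PARENTS = {
    "root": set(),
    "binding": {"root"},
    "catalysis": {"root"},
    "dna_binding": {"binding"},
    "rna_binding": {"binding"},
}

# Hypothetical direct annotations: protein -> set of terms.
ANNOTATIONS = {
    "P1": {"dna_binding"},
    "P2": {"rna_binding"},
    "P3": {"catalysis"},
    "P4": {"dna_binding"},
}

def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def information_content():
    """IC(t) = -ln(fraction of proteins annotated to t or any descendant)."""
    counts = {term: 0 for term in PARENTS}
    for terms in ANNOTATIONS.values():
        # Propagate each direct annotation up to every ancestor term.
        propagated = set().union(*(ancestors(t) for t in terms))
        for term in propagated:
            counts[term] += 1
    total = len(ANNOTATIONS)
    return {t: -math.log(c / total) for t, c in counts.items() if c > 0}

# Precompute once so that similarity queries are simple lookups.
IC = information_content()

def resnik(term_a, term_b):
    """Resnik-style similarity: IC of the most informative common ancestor."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(IC[t] for t in common)
```

In this toy setting, `resnik("dna_binding", "rna_binding")` equals the IC of `binding`, while two terms that share only the root have similarity zero.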
Background: Protein-protein docking, which aims to predict the structure of a protein-protein complex from its unbound components, remains an unresolved challenge in structural bioinformatics. An important step is the ranking of docked poses using a scoring function, for which many methods have been developed. There is a need to explore the differences and commonalities of these methods with each other, as well as with functions developed in the fields of molecular dynamics and homology modelling. Results: We present an evaluation of 115 scoring functions on an unbound docking decoy benchmark covering 118 complexes for which a near-native solution can be found, yielding top-10 success rates of up to 58%. Hierarchical clustering is performed so as to group together functions which identify near-natives in similar subsets of complexes. Three set-theoretic approaches are used to identify pairs of scoring functions capable of correctly scoring different complexes. This shows that functions in different clusters capture different aspects of binding and are likely to work together synergistically. Conclusions: All functions designed specifically for docking perform well, indicating that functions are transferable between sampling methods. We also identify promising methods from the field of homology modelling. Further, differential success rates by docking difficulty and solution quality suggest a need for flexibility-dependent scoring. Investigating pairs of scoring functions, the set-theoretic measures identify known scoring strategies as well as a number of novel approaches, indicating promising augmentations of traditional scoring methods. Such augmentation and parameter combination strategies are discussed in the context of the learning-to-rank paradigm...
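The top-N success rate and the idea of comparing which complexes two scoring functions get right can be sketched as below. The abstract does not specify its three set-theoretic approaches, so the plain intersection, difference, and union used here are stand-ins, and the complex names, rankings, and the small cutoff `n=2` are made-up toy values.

```python
def top_n_successes(ranked_poses, n=10):
    """Return the complexes with a near-native pose among the top n.

    ranked_poses maps a complex name to a list of booleans (True = near-native),
    ordered from best-scored to worst-scored pose.
    """
    return {cplx for cplx, flags in ranked_poses.items() if any(flags[:n])}

def success_rate(successes, n_complexes):
    """Fraction of benchmark complexes counted as successes."""
    return len(successes) / n_complexes

def complementarity(successes_a, successes_b):
    """Simple set comparison of two scoring functions (illustrative only)."""
    return {
        "both": successes_a & successes_b,
        "only_a": successes_a - successes_b,
        "only_b": successes_b - successes_a,
        "either": successes_a | successes_b,
    }

# Toy rankings for two hypothetical scoring functions over five complexes.
func_a = {
    "c1": [True, False, False],
    "c2": [False, False, True],
    "c3": [False, False, False],
    "c4": [False, True, False],
    "c5": [False, False, True],
}
func_b = {
    "c1": [False, True, False],
    "c2": [True, False, False],
    "c3": [False, True, False],
    "c4": [False, False, False],
    "c5": [False, False, False],
}

succ_a = top_n_successes(func_a, n=2)
succ_b = top_n_successes(func_b, n=2)
sets = complementarity(succ_a, succ_b)
rate_a = success_rate(succ_a, len(func_a))
rate_b = success_rate(succ_b, len(func_b))
rate_either = success_rate(sets["either"], len(func_a))
```

Here the two functions succeed on largely different complexes, so the success rate of the pair taken together exceeds that of either function alone, which is the kind of complementarity the abstract describes.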
Background: Ontologies and catalogs of gene functions, such as the Gene Ontology (GO) and MIPS-FUN, assume that functional classes are organized hierarchically, that is, general functions include more specific ones. This has recently motivated the development of several machine learning algorithms for gene function prediction that leverage this hierarchical organization, in which instances may belong to multiple classes. In addition, it is possible to exploit relationships among examples, since it is plausible that related genes tend to share functional annotations. Although these relationships have been identified and extensively studied in the area of protein-protein interaction (PPI) networks, they have not received much attention in hierarchical and multi-class gene function prediction. Relations between genes introduce autocorrelation in functional annotations and violate the assumption that instances are independently and identically distributed (i.i.d.), which underlies most machine learning algorithms. Although the explicit consideration of these relations brings additional complexity to the learning process, we expect substantial benefits in the predictive accuracy of learned classifiers. Results: This article demonstrates the benefits (in terms of predictive accuracy) of considering autocorrelation in multi-class gene function prediction. We develop a tree-based algorithm for considering network autocorrelation in the setting of Hierarchical Multi-label Classification (HMC). We empirically evaluate the proposed algorithm, called NHMC (Network Hierarchical Multi-label Classification), on 12 yeast datasets using each of the MIPS-FUN and GO annotation schemes and exploiting 2 different PPI networks. The results clearly show that taking autocorrelation into account improves the predictive performance of the learned models for predicting gene function. Conclusions: Our newly developed method for HMC takes network information into account in the learning phase. When it is used for gene function prediction in the context of PPI networks, the explicit consideration of network autocorrelation increases the predictive performance of the learned models. Overall, we found that this holds across different gene features/descriptions, functional annotation schemes, and PPI networks. The best results are achieved when the PPI network is dense and contains a large proportion of function-relevant interactions....
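The two ingredients named in this abstract, hierarchical labels and network autocorrelation, can be illustrated with a small diagnostic. This is not the NHMC algorithm: it only checks, on made-up data, whether genes linked in a network have more similar hierarchical annotations than unlinked genes. The dotted label codes, the gene names, and the use of Jaccard similarity are assumptions for the example.

```python
from itertools import combinations

def with_ancestors(labels):
    """Close a label set under the hierarchy: '1.2.3' implies '1.2' and '1'."""
    closed = set()
    for label in labels:
        parts = label.split(".")
        for depth in range(1, len(parts) + 1):
            closed.add(".".join(parts[:depth]))
    return closed

def jaccard(a, b):
    """Jaccard similarity of two label sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def mean_similarity(pairs, annotations):
    """Average annotation similarity over a collection of gene pairs."""
    pairs = list(pairs)
    return sum(jaccard(annotations[u], annotations[v]) for u, v in pairs) / len(pairs)

# Hypothetical genes with hypothetical hierarchical function codes.
raw = {"g1": {"1.1"}, "g2": {"1.1", "1.2"}, "g3": {"2.1"}, "g4": {"2.1"}}
annotations = {gene: with_ancestors(labels) for gene, labels in raw.items()}

# Undirected toy network; each edge is stored with its endpoints in sorted order
# so that it matches the pairs produced by combinations() below.
edges = {("g1", "g2"), ("g3", "g4")}
non_edges = [p for p in combinations(sorted(annotations), 2) if p not in edges]

linked = mean_similarity(edges, annotations)
unlinked = mean_similarity(non_edges, annotations)
```

When `linked` is clearly larger than `unlinked`, as in this toy case, the annotations are autocorrelated over the network, which is the situation in which the i.i.d. assumption mentioned in the abstract breaks down.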